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Abstract 

A number of problems in quantum state and system identification are addressed. Specif- 
ically, it is shown that the maximum likelihood estimation (MLE) approach, already known 
to apply to quantum state tomography, is also applicable to quantum process tomography 
(estimating the Kraus operator sum representation (OSR)), Hamiltonian parameter estima- 
tion, and the related problems of state and process (OSR) distribution estimation. Except 
for Hamiltonian parameter estimation, the other MLE problems are formally of the same 
type of convex optimization problem and therefore can be solved very efficiently to within 
any desired accuracy. 

Associated with each of these estimation problems, and the focus of the paper, is an 
optimal experiment design (OED) problem invoked by the Cramer-Rao Inequality: find the 
number of experiments to be performed in a particular system configuration to maximize 
estimation accuracy; a configuration being any number of combinations of sample times, 
hardware settings, prepared initial states, etc.. We show that in all of the estimation prob- 
lems, including Hamiltonian parameter estimation, the optimal experiment design can be 
obtained by solving a convex optimization problem. ' 
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Introduction 



"In a machine such as this [a quantum computer] there are very many other problems due to 
imperfections. For example, in the registers for holding the data, there will be problems of 
cross-talk, interactions between one atom and another in that register, or interaction of the 
atoms in that register directly with things that are happening along the program line that 
we didn 't exactly bargain for In other words, there may be small terms in the Hamiltonian 
besides the ones we've written. Until we propose a complete implementation of this, it is 
very difficult to analyze. At least some of these problems can be remedied in the usual way 
by techniques such as error correcting codes and so forth, that have been studied in normal 
computers. But until we find a specific implementation for this computer, I do not know how 
to proceed to analyze these effects. However, it appears that they would be very important 
in practice. This computer seems to be very delicate and these imperfections may produce 
considerable havoc." 

- Richard P. Feynman, "Quantum Mechanical Computers," Optics News, February 1985. 

1.1 Alleviating the "havoc" 

The concerns heralded by Feynman remain of concern today in all the implementations envi- 
sioned for quantum information systems. In a quantum computer it is highly likely that in order 
to achieve the desired system objectives, these systems will have to be tuned, or even entirely 
determined, using estimated quantities obtained from data from the actual system rather than 
solely relying on an initial design from a theoretical model. The problem addressed here is to 
design the experiment in order to yield the optimum information for the intended purpose. This 
goal is not just limited to quantum information systems. It is an essential step in the engineer- 
ing practice of system identification [22, Ch.l4]. That is, the design of the experiment which 
gives the best performance against a given set of criteria, subject to constraints reflecting the 
underlying properties of the system dynamics and/or costs associated with the implementation 
of certain operations or controls. 

Clearly each application has a specific threshold of performance. For example, the require- 
ments in quantum chemistry are generally not as severe as in quantum information systems. The 
objective of a measurement, therefore, depends on the way in which information is encoded into 
the system to begin with, and this is in turn, depends on the application. In this paper we are 
concerned with estimating quantum system properties: the state, the process which transforms 
the state, and parameters in a Hamiltonian model. 

The estimation of the state of a quantum system from available measurements is generally 
referred to as quantum state tomography about which there is extensive literature on both theo- 
retical and experimental aspects, e.g., see [27, Ch.8], [15] and the references therein. The more 
encompassing procedure of quantum system identification is not so easily categorized as the 
nomenclature (and methodology) seems to depend on the type and/or intended use of the iden- 
tified model. For example, quantum process tomography (QPT) refers to determining the Kraus 
operator-sum-representation (OSR) of the input state to output state (completely positive) map, 
e.g., [27, §8.4.2], [7]. Hamiltonian parameter estimation refers to determining parameters in a 
model of the system Hamiltonian, e.g., [24], [6], [12], [37]. Somewhere in between quantum 
process tomography and Hamiltonian parameter estimation is mechanism identification which 
seeks an estimate of population transfer between states as the system evolves, e.g., [25]. 
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Maximum likelihood estimation (MLE), a well established method of parameter estima- 
tion which is used extensively in current engineering applications, e.g., [22], was proposed in 
[4, 29] and [33] for quantum state tomography of a quantum system with non-continuing mea- 
surements, i.e., data is taken from repeated identical experiments. Also, as observed in [29, 33], 
the MLE of the density matrix is a convex optimization problem. 

In this paper we address the related problem of optimal experiment design (OED) so as 
to secure an estimate of the best quality. The approach presented relies on minimizing the 
Cramer-Rao lower bound [8] where the design parameters are the number of experiments to be 
performed while the system is in a specified configuration. En route we also show that many 
related problems in state and process tomography can also be solved using MLE, and moreover, 
they are all formally the same type of convex optimization problem, namely, a determinant max- 
imization problem, referred to as a maxdet problem [5, 36]. Similarly, the OED problem posed 
here is also of a single general type of convex optimization problem, namely, a semidefinite 
program (SDP). 

Convexity arises in many ways in quantum mechanics and this is briefly discussed in §1.2. 
The great advantage of convex optimization is a globally optimal solution can be found effi- 
ciently and reliably, and perhaps most importantly, can be computed to within any desired ac- 
curacy. Achieving these advantages, however, requires the use of specialized numerical solvers. 
As described in §1.3, the appropriate convex solvers have been embedded in some software 
tools we have composed which can solve the MLE and OED problems presented here. 

In the remainder of the paper we present both MLE and the corresponding OED as applied 
to: quantum state tomography (MLE in §2.2 and OED in §2.3), estimating the distribution of 
known input states (MLE in §2.5 and OED in §2.6), quantum process tomography using the 
Kraus operator sum representation (MLE in §3.1 and OED in §3.2), estimating the distribution 
of a known OSR set (MLE in §3.4 and OED in §3.5), and to Hamiltonian parameter estimation 
(MLE in §4.1 and OED in §4.2). A summary in table form is presented in §5 followed by 
a discussion in §6 of the relation of MLE and OED to iterative adaptive control of quantum 
systems. 

1.2 Convexity and quantum mechanics 

Many quantum operations form convex sets or functions. Consider, for example, the following 
convex sets which arise from some of the basic aspects of quantum mechanics: 

probability outcomes {pa G R} EaPa = 1, Pa > 

density matrix {p e C"''"} Tr p = 1, p > 



positive operator 
valued measure (POVM) 

operator sum 
representation (OSR) 
in fixed basis 

{Si e C"^" 1^ = } 



[x e C"'x"'} Eij X,, B*B^ = /„, X > 
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An example of a convex function relevant to quantum information is worst-case gate fidelity, a 
measure of the "distance" between two unitary operations on the same input. As pointed out in 
[13], there are many ways to define this measure. Consider, for example, 

r^(t/dcs, f/act) = ,min |(?7des^)* (f/actV^)!' (D 
\\^\=l 

where U^cs £ C"^" is the desired unitary and f/act G C"^" is the actual unitary. In this case 
the worst-case fidelity can be interpreted as the minimum probability of obtaining the desired 
output state f/desV^ over all possible pure input states which produce the actual output state 
Uactip- If Udes and [/act differ by a scalar phase then the worst-case fidelity is clearly unity; 
which is consistent with the fact that a scalar phase cannot be measured. This is not the case for 
the error norm \\Udcs - f/act || • 

As shown in Appendix §A. 1 , obtaining the worst-case fidelity requires solving the following 
(convex) quadratic programming (QP) problem: 

minimize z^{aa^ + hlF)z 

subject to ELi Zk = l, Zk>Q ^' 
with the vectors a, h in R" the real and imaginary parts, respectively, of the eigenvalues of the 
unitary matrix Ul^JJ^^t, that is, a = Re e\g{Ul^JJ^ct), b = Im eig{U^^^U^ct)- In some cases it 
is possible to compute the worst-case fidelity directly, e.g., in the example in Section §4.3 and 
in some examples in [27, §9.3]. Although the optimal objective value /"'^(f/des, f^act) is global, 
the optimal worst-case state which achieves this value is not unique. 

In addition to these examples, convex optimization has been exploited in [3] and [33] in an 
attempt to realize quantum devices with certain properties. In [9] and [21], convex optimization 
is used to design optimal state detectors which have the maximum efficiency. 

In general, convex optimization problems enjoy many useful properties. From the introduc- 
tion in [5], and as already stated, the solution to a convex optimization problem can be obtained 
to within any desired accuracy. In addition, computation time does not explode with problem 
size, stopping criteria always produce a lower bound on the solution, and if no solution can be 
found a proof of infeasibility is provided. There is also a complete duality theory which can 
yield more efficient computation as well as optimality conditions. This is explored briefly in 
Section §2.3. 

1.3 Software for tomography & experiment design 

We have composed some MATLAB m-files which can be used to solve a subset of the QPT and 
OED convex optimization problems presented here. The examples shown here were generated 
using this software. The software, available upon request from the first author, requires the 
convex solvers YALMIP [23] and SDPT3 [35] which can be downloaded from the internet. 
These solvers make use of interior-point methods for solving convex optimization problems, 
e.g.,[5,Ch.ll],[26]. 

2 Quantum State Tomography 

Consider a quantum system which has riout distinct outcomes, labeled by the index a, a = 
1, . . . , nout, and which can be externally manipulated into distinct configurations, labeled 
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by the index 7,7 = 1,..., ncfg. Configurations can include wave-plate angles for photon count- 
ing, sample times at which measurements are made, and settings of any experimental "knobs" 
such as external control variables, e.g., laser wave shape parameters, magnetic field strengths, 
and so on. For quantum process tomography (§3.1) and Hamiltonian parameter estimation 
(§4.1), configurations can also include distinctly prepared initial states. 

The problem addressed in this section is to determine the minimum number of experiments 
per configuration in order to obtain a state estimate of a specified quality, i.e., what is the 
tradeoff between number of experiments per configuration and estimation quality. The method 
used to solve this problem is based on minimizing the size of the Cramer-Rao lower bound on 
the estimation error [8]. 

2.1 Data collection 

The data is collected using a procedure referred to here as non-continuing measurements. Mea- 
surements are recorded from identical experiments in each configuration 7 repeated £^ times. 
The set-up for data collection is shown schematically in Figure 1 for configuration 7. 



^truc ^ oyixn 



true 



7 




a 



POVM 

1 , . . . , TT-out 



Outcome counts 
ria^, trials 

Q. 1, . . . , JT-out 



Figure 1: System/POVM. 

Here p^™^ E C"^" is the true, unknown state to be estimated, o"^™*^ G C"^" is the reduced 
density matrix which captures all the statistical behavior of the Q-system under the action of 
the measurement apparatus, and ria-f is the number of times outcome a is obtained from the £^ 
experiments. Thus, 



E 



n. 



07 



''7; •''expt 



E 



(3) 



where ^expt is the total number of experiments. The data set consists of all the outcome counts. 



D = {n. 



07 



a 



1, . . . ,nout,7 = 1, • • • } 



(4) 



The design variables used to optimize the experiment are the non-negative integers repre- 
sented by the vector, 

^=[^l---^n,J (5) 

Let p^™'^ denote the true probability of obtaining outcome a when the system is in configuration 
7 with state input p*''"°. Thus, 

En„^ = VaT (6) 

where the expectation E(-) taken with repect to the underlying quantum probability distribu- 
tions. 

We pose the following model of the system. 



Pa-lip) = Tr Mo^a^(p) 



(7) 
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where Pa-Xp) is the outcome probability of measuring a when the system is in configuration 7 
with input state p belonging to the set of density matrices, 

[p e C"^" |p > 0, Trp = 1 } (8) 

{Ma'^} are the POVM elements of the measurement apparatus, and thus, for 7=1,..., ncfg, 

^ M,^ = /„, > 0, a = 1, . . . , nout (9) 

a 

and cr^(p) is the reduced density output state of the Q-system model. A general (model) 
representation of the Q system is the Kraus operator-sum-representation (OSR) which can 
account for many forms of error sources as well as decoherence [27]. Specifically, in con- 
figuration 7, the Q-system model can be parametrized by the set of Kraus matrices, = 
{ K^k e C"^" I A; = 1, . . . , } as follows: 

a,{p) = Q{p, K,) = Y: K,kpK;„ Yl K*,kK,k = In (10) 

k=l k=l 

with < n^. Implicit in this OSR is the assumption that the Q-system is trace preserving. 
Combining this with the measurement model (9) gives the model probability outcomes, 

Pa-i{p) = TrOayp, Oay = K*yk^a^K-ik (H) 

k=l 

In this model, the outcome probabilities are linear in the input state density matrix. Moreover, 
the set Oy = { Oaj \ a = 1, . . . , riout }, satisfies (9), and hence, is a POVM. ^ If the Q-system 
is modeled as a unitary system, then, 

ay{p) = Uypu;, u;u., = in^o^y = u;M^yUy (i2) 

The set is still a POVM; in effect the OSR has a single element, namely, = U^. 
System in the model set 

We make the following assumption throughout: the true system is in the model set. This means 
that, 

PaT = ^'«7(P*"") = Tr 0,,p*^'^^ (13) 

This is always a questionable assumption and in most engineering practice is never true. Relax- 
ing this assumption is an active research topic particularly when identification (state or process) 
is to be used for control design, e.g., see [19] and [34]. The case when the system is not in 
the model set will not be explored any further here except for the effect of measurement noise 
which is discussed next. It is important to emphasize that in order to produce an accurate unbi- 
ased estimate of the true density it is necessary to know the noise elements (as described next) 
which is a consequence of assumption (13). 

"In a more general OSR the Q-system need not be trace preserving, hence the Kraus matrices in (10) need not 
sum to identity as shown, but rather, their sum is bounded by identity. Then the set O-y is not a POVM, however, 
satisfies, J2a < In, Oa^ > 0, a = 1, . . . , nout 
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Noisy measurements 

Sensor noise can engender more noisy outcomes than noise-free outcomes. Consider, for ex- 
ample, a photon detection device with two photon-counting detectors. If both are noise-free, 
meaning, perfect efficiency and no dark count probability, then, provided one photon is always 
present at the input of the device, there are only two possible outcomes: {10, 01}. If, however, 
each detector is noisy, then either or both detectors can misfire or fire even with a photon always 
present at the input. Thus in the noisy case there are /oMr possible outcomes: {10, 01, 11, 00}. 

Let { Mq,^ I a = 1, . . . , riout } denote the noisy POVM and let | Mq,^ | a = 1, . . . , rlout | 
denote the noise-free POVM with riout > Tlout where, 

"out 

= Ua/s-r Mp^, a = 1, . . . , nout, 7 = 1, • • • , ?^cfg (14) 

13=1 

The {i^af3-f} represents the noise in the measurement, specifically, the conditional probability 
that a is measured given the noise-free outcome P with the system in configuration 7. Since 
J2a ^afS'y = 1, V/9, 7, it foUows that if the noise-free set is a POVM then so is the noisy set. 



2.2 Maximum likelihood state estimation 

The Maximum Likelihood (ML) approach to quantum state estimation presented in this section, 
as well as observing that the estimation is convex, can be found in [29], [37] and the references 
therein. Using convex programming methods, such as an interior-point algorithm for computa- 
tion, was not exploited in these references. 

If the experiments are independent, then the probability of obtaining the data (4) is a product 
of the individual model probabilities (7). Consequently, for an assumed initial state p, the model 
predicts that the probability of obtaining the data set (4) is given by, 

Prob{D,p} = nPa,(p)""^ (15) 

a, 7 

The data is thus captured in the outcome counts {n^^} whereas the model terms have a p- 
dependence. The function Prob {D, p} is called the likelihood function and since it is positive, 
the maximum likelihood estimate (MLE) of p is obtained by finding a p in the set (8) which 
maximizes the log-likelihood function, or equivalently, minimizes the negative log-likelihood 
function, 

L{D,p) = -log Prob {D,p} 

a,7 (iO) 

a, 7 

These expressions are obtained by combining (15), (16) and (11). The Maximum Likelihood 
state estimate, p^^, is obtained as the solution to the optimization problem: 

minimize L{D,p) = - J2a,'y ^a-y log Tr Oa-yP 

subject to p > 0, Tr p = 1 ^ ^ 

L(D, p) is a positively weighted sum of log-convex functions of p, and hence, is a log-convex 
function of p. The constraint that p is a density matrix forms a convex set in p. Hence, (17) is 
in a category of a class of well studied log-convex optimization problems, e.g., [5]. 
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Pure state estimation 



Suppose it is known the the true state is pure, that is, p = V^truc^truo wi'^h iptmc ^ and 
^trucV^tiuc = 1. In practice we have found that solving (17) when the true state is pure gives 
solutions which are easily approximated by pure states, that is, the estimated state has one 
singular value near one and all the rest are very small and positive. 

To deal directly with pure state estimation we first need to characterize the set of all density 
matrices which are pure. This is given by the set { p G C"^" | P > 0, rank p = 1 }, which is 
equivalent to, 

{p G C''^" |p > 0, Tip = 1, Tr p2 = 1 } (18) 
The corresponding ML estimate is then the solution of. 



minimize L(p) = — J2a,'y '^a-y log Tr O^^p 
subject to p > 0, Tr p = 1, Tr p2 = 1 



(19) 



This is not a convex optimization problem because the equality constraint, Tr p^ = 1, is not 
convex. However, relaxing this constraint to the convex inequality constraint, Tr p^ < 1, results 
in the convex optimization problem: 



minimize L{p) = - J^a^ '^a-i log Tr O^-yp 
subject to p > 0, Tr p = 1, Tr p2 < 1 



(20) 



If the solution is on the boundary of the set Tr p^ < 1, then a pure state has been found. There 
is however, no guaranty that this will occur. 

Least-squares (LS) state estimation 

In a typical application the number of trials per configuration, is sufficiently large so that the 
empirical estimate of the outcome probability, 

= ^ (21) 

is a good estimate of the true outcome probability p^^". The empirical probability estimate also 
provides the smallest possible value of the negative log-likelihood function, that is, p^^^ is the 
solution to, 

minimize L{p) = - Eq,7 ria^ log pa-y ^22) 
subject to J2a Pa-y = 1,V7, pocy > 0, Vo, 7 

with optimization variables Pa^, Va, 7. Thus, for any value of p we have the lower bound, 

- ^^"7 1°S ~P' - ~Y1 "'"7 l°g *^"7/' (23) 

a, 7 '''7 a, 7 

In particular, assuming (6) holds, and the trials are independent, then the variance of the 
empirical estimate is known to be [28], 

var/,7 = ^pl7(l-p*J-) (24) 
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It therefore follows that for large l^, ~ Pa^^^ and if as assumed (13), the system is in the 
model set, thenp^^!^^ = Tr Oa^p*™*^. These two conditions lead to taking the state estimate as 
the solution to the constrained weighted least- squares problem: 



The weights, Wj, are chosen by the user to emphasize different configurations. A typical choice 
is the distribution of experiments per configuration, hence, Wy > 0, J2-y Wj = 1. Because 
of the semi-definite constraint, this weighted-least-squares problem is a convex optimization in 
the variable p. For large i^, the solution ought to be a good estimate of the true state. There 
is, however, little numerical benefit in solving (25) as compared to (17) - they are both convex 
optimization problems and the numerical complexity is similar provided (17) is solved using an 
interior-point method [5]. Some advantage is obtained by dropping the semidefinite constraint 
p > in (25) resulting in. 



This is a standard least-squares problem with a linear equality constraint which can be solved 
very efficiently using a singular value decomposition to eliminate the equality constraint [14]. 
For sufficiently large the resulting estimate may satisfy the positivity constraint p > 0. If 
not, it is usually the case that some of the small eigenvalues of the state estimate or estimated 
outcome probabilities are slightly negative which can be manually set to zero. Solving (26) is 
numerically faster than solving (17), but not by much. Even with a large amount of data the 
solution to (26) can produce estimates which are not positive if the data is not sufficiently rich. 
In this case the estimates from any procedure which eliminates the positivity constraint can be 
very misleading. 

It thus appears that even for large i^, there is no significant benefit accrued, either because 
of numerical precision or speed, to using the empirical estimate followed by standard least- 
squares. If, however, the are not sufficiently large and/or the data is not sufficiently rich, then 
it is unlikely that the estimate from (26) will be accurate. 

One possible advantage does come about because the solution to (26) can be expressed 
analytically, and thus it is possible to gain an understanding of how to select the POVM. For 
example, in [27] special POVM elements are selected to essentially diagonalize the problem, 
thereby making the least-squares problem (26) simpler, i.e., the elements of the density matrix 
can be estimated one at a time. However, implementing the requisite POVM set may be very 
difficult depending on the physical apparatus involved. 

2.3 Experiment design for state estimation 

In this section we describe the experiment design problem for quantum state estimation. The 
objective is to select the number of experiments per configuration, the elements of the vector 
i = [ii - ■ ■ ^n^-fg]"^ £ R"<=fg, SO as to minimize the error between the state estimate, p{i), and the 




(25) 



subject to p > 0, Tr p = 1 




(26) 



subject to Tr p = 1 
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true state p . Specifically, we would like to solve for £ from: 



minimize E \\p{i) - p*"-"" 1 1 

subject to E7 ^7 = 4xpt (27) 
integer >0, 7 = 1,..., ridg 

where i^xpt is the desired number of total experiments. This is a difficult, if not insoluble prob- 
lem for several reasons. First, the solution depends on the estimation method which produces 
p{i). Secondly, the problem is integer combinatorial because £ is a vector of integers. And 
finally, the solution depends on p*™*^, the very state to be estimated. Fortunately all these issues 
can be circumvented. 

We first eliminate the dependence on the estimation method. The following result can be 
established using the Cramer-Rao Inequality [8]. The derivation is in Appendix §A.3. 

State estimation variance lower bound ^ 

Suppose the system generating the data is in the model set used for estimation, i.e., 
(13) holds. For £ = [£i- ■ ■ £n^f^] experiments per configuration, suppose p{£) is a 
density matrix and an unbiased estimate of p*™°, i.e., p{i) > 0, Tr p{£) = 1, and 
E p{£) = p^'^^'^. Under these conditions, the estimation error variance satisfies. 



E \\p{£) - pt-lll^, > V{£, p'^n = Tr G{£, p'^^'' (28) 



where"^ 



is 



gam=l 



7 VP ; cq I Pa7(p*™^) ' 



^07 = vec Oq,^ G C"^ 



and Ceq G R" ^" ^ is part of the unitary matrix in the singular value decomposi- 
tion: vec /„ = USW^ eR''\W=[c Ceq] G R"' 



-xn^ 



This theorem states the for any unbiased estimate of p*'^"^, the variance of the estimate satisfies 
the inequality (28). The power of the result is that it is independent of how the estimate is 
obtained, i.e., no estimation algorithm which produces an unbiased estimate can have an esti- 
mation error variance smaller than that in (28). There is a generalization for biased estimators 
but we will not pursue that here. 

In general it is difficult to determine if any estimate will achieve the lower bound. However, 
under the conditions stated in the above result, the ML estimate, p^^{£), the solution to (17), 
approaches p^^^'^ with probability one, asymptotically as £cxpt increases, and the asymptotic 



^Cramer-Rao bounds previously reported in the literature are not quite correct as they do not include the linear 
constraint Tr p = 1 as is done here. 

'^The vec operation takes the rows of a matrix and stacks them one row at a time on top of each other Two 
useful expressions are \-ec{AX B) = [B^ ® A)-vec X and Tr AX = (vec A^Y'^ec X. 
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distribution becomes Gaussian with covariance given by the Cramer-Rao bound (see §A.2 for 
the covariance expression and [22] for a derivation). 

The one qualifier to the Cramer-Rao bound as presented is that the indicated inverse exists. 
This condition, however, is necessary and sufficient to insure that the state is identifiable. More 
precisely, the state is identifiable if and only if, 

G(£ = 1.,,, = C:,(y.1: is invertible (30) 

Under the condition of identifiability, the experiment design problem can be expressed by the 
following optimization problem in the vector of integers L 

minimize V{i, p*''"^) = Tr pfu«)-i 

subject to E7 ^7 = 4xpt (31) 
integer > 0, 7 = 1,..., ridg 

where £expt is the desired number of total experiments. The good news is that the objective, 
V{^, p*''™), is convex in I [5, §7.5]. Unfortunately, there are still two impediments: (i) restrict- 
ing £ to a vector of integers makes the problem combinatorial; (ii) the lower-bound function 
p^'^^'^) depends on the true value, p^^'^'^. These difficulties can be alleviated to some extent. 
For (i) we can use the convex relaxation described in [5, §7,5]. For (ii) we can solve the relaxed 
experiment design problem with either a set of "what-if" estimates as surrogates for p*'^"'^, or 
use nominal values to start and then "bootstrap" to more precise values by iterating between 
state estimation and experiment design. We now explain how to perform these steps. 



Relaxed experiment design for state estimation 

Following the procedure in [5, §7.5], introduce the variables = ^-y/^xpt, each of which is 
the fraction of the total number of experiments performed in configuration 7. Since all the £^ 
and £cxpt are non-negative integers, each is non-negative and rational, specifically an integer 
multiple of l/fcxpt^ and in addition, Y,-y -^7 = 1- Let P^"" denote a surrogate for p*^"*^, e.g., an 
estimate or candidate value of p^^^^. Using (28)-(29) gives, 

v{i = 4xpt A, p'^^n = 7^^(A, p^n (32) 

*-expt 

Using (29), 

V{X,p''''') = TrG'(A,p™")"i 

G'(A,p™'-'-) = ^G^ip'""'') (33) 

7 

Hence, the objective function V{i, p™") can be replaced with V{X, p'^"") and the experiment 
design problem (31) is equivalent to. 

minimize ^(A, p^""'') = Tr G(A, p^""^"^)-! 

subject to = 1 (34) 

A^ > 0, integer multiple of l/4xpt, 7 = 1, • • • , ''^^cfg 
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The objective is now a convex function of the A^, but it is still a combinatorial problem because 
the are constrained to each be an integer multiple of l/4xpt- If A^ is only otherwise con- 
strained to the non-negative reals, then this has the effect of relaxing the constraint that the 
are integers. As phrased in [5], the relaxed experiment design problem is: 

minimize ^(A, p'''") = Tr (e^ X^G^ip'''''))"^ 
subject to E7 = 1 (35) 
A7 > 0, 7 = 1,.. . ,ncfg 

The objective is convex, the equality constraint is linear, and the inequality constrains are con- 
vex, hence, this is a convex optimization problem in A G R^'^'g . Let A°p* denote the optimal 
solution to (35). Since the problem no longer depends on icxpt, A°p* can be viewed as a distri- 
bution of experiments per configuration.^ Clearly there is no guaranty that ^exptA°P* is a vector 
of integer multiples of l/4xpt- A practical choice for obtaining a vector of integer multiples of 

1/ •^expt is, 

C-^ = round {4xptA°P*} (36) 
If £°P* is the (unknown) integer vector solution to (31), then we have the relations: 

v^(Cxpf , m > v{t^\ p'^n > ^(4xptA°p*, rn m 

The optimal objective is thus bounded above and below by known values obtained from the 
relaxed optimization. The gap within which falls the optimal solution can be no worse than the 
difference between V{P°^'^, p^*^") and ^(4xptA°P*, p^""), which can be computed solely from 
A°P*. If the gap is sufficiently small then for all practical purposes the "optimal" solution is A°p*. 
From now on we will refer to A°p* as the optimal solution rather than the relaxed optimal. 



Performance tradeoff 

The optimal distribution A°p* can be used to guide the elimination of small values of A°p*. For 
example, consider the suboptimal distribution, A^*^*^, obtained by selecting the largest risub out 
of TT-cfg non-zero values of A°p*. Let t^^^ denote the integer vector of configurations, 

Cpt = round {4xptA^""} (38) 

Using (32), the minimum number of experiments so that V{t^^, p^^^^) < Vq is given by, 

= round { V(A^"^ p^n/Vo} (39) 

As Usuh is varied, the graph {rigub, ^expt) establishes a tradeoff between the number of configu- 
rations per experiment versus the total number of experiments such that the lower bound on the 
estimation variance does not exceed the desired value Vq. When rigub = n^g, (39) is identical 
with (36). 

^Caveat emptor: The relaxed optimal experiment design distribution, A°p*, is optimal with respect to the initial 
state p^^", a surrogate for p^'^^°. Thus, A°p' is not optimal with respect to p''"''. This should be no surprise because 
the underlying goal is to find a good estimate of p''"''. 
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The condition number of the matrix G(A°p*, p^"") gives an indication of the identifiability 
of the density matrix p^^^^. A very large condition number means that the linear combination 
of elements of the density matrix associated with the small eigenvalue will be more difficult to 
obtain then those combinations associated with a large eigenvalue. The condition number of 
G(A°P*, p^^^^) is not only affected by the number of experiments per configuration, A°p*, but by 
the configurations themselves. Examining G(A°p*, p^^^^) for different p^""' (surrogates of p*™*^) 
and different configurations can help establish a good experiment design. 

Bootstrapping 

A standard approach used to circumvent not knowing the true state needed to optimize the 
experiment design is to proceed adaptively, or by "bootstrapping." The idea is to use the current 
estimate of the initial state found from (17), then solve (35), and then repeat. The algorithm at 
the A;-th iteration looks like this: 

p{k) = aigmm V (i{k — 1), p) 
X°p\k) = argmmV{X, p = p{k)) (40) 

A 

I{k) = round {4xptA°P*(^)} 

The initial distribution £(0) could be chosen as uniform, e.g. , the same for a not too large number 
of configurations. The algorithm could also start by first solving for a distribution from an initial 
state surrogate. In each iteration we could also vary 4xpt- Although each optimization is convex, 
the joint problem may not be. Conditions for convergence would need to be investigated as well 
as establishing that this method is efficient, i.e., reduces the number of trials. We will not pursue 
this any further here. 

Dual experiment design problem 

Lagrange Duality Theory can provide a lower bound on the objective function in an optimization 
problem as well as establishing optimality conditions often leading to insights into the optimal 
solution structure [5, Ch.5]. In many cases the largest lower bound - the solution of the dual 
problem - is equal to the optimal objective function. The dual problem associated with the 
experiment design problem (35) is, 

maximize {Ti W^/"^^ 

subject to Tr ^^^^(p^""') < 1, 7 = 1, . . . , ncfg (41) 
W >Q 

The optimization variable is TV G C"^"^^"^^^. The above form of the dual is given in [5, 
§7.5.2] for a slightly simpler problem (the are dyads) but is essentially the same. A key 
observation arises from the complementary slackness condition, 

Af (Tr iy°P'G^(p-") - 1) = 0, 7 = 1, . . . , nefg (42) 
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where A°p* is the solution to the primal problem, (35), and W°^^ is the solution to the dual 
problem, (41). Thus, only when the equality constraint holds, Tr 11/°P*G^(p™") = 1, is the 
associated A°p' not necessarily equal to zero. It will therefore be usually the case that many of 
the elements of the optimal distribution will be zero. 

Strong duality also holds for this problem, thus the optimal primal and dual objective values 
are equal, 

Tr A°P'G^(p^"'^'^)^ = (Ti (l^°P*)i/2)^ (43) 

For this problem, a pair (A, W) is optimal with respect to p^"" if and only if: 

A^ = 1 

A^ > 0, V7 

A°P* (Tr iy°P*G^(p^"") - 1) = 0, V7 (44) 

Tr iyG^(p™") < 1, V7 

Tr(E7 A^G^(p™"))"^ = (Ti (iy°Pt)V2)^ 

2.4 Example: experiment design for state estimation 

A schematic of an apparatus for state tomography of a pair of entangled photons specified by 
the quantum state (density matrix) p is shown in figure 2. 




Figure 2: Detection apparatus for two-photon tomography 

The set up has four photon-counting detectors. A, B, C, D. There are four continuous vari- 
able settings for the quarter-wave plates and half-wave plates, i.e., q, h, q', h'. For any settings 
of these parameters one of the detectors in each arm will register a photon. The objective is to 
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determine the optimal settings of these parameters and the number of experiments per setting 
for estimation of the state p of the pair using as data the photon counts from the four detectors. 

Because the photon sources are not completely efficient, the input quantum state actu- 
ally consists of either two or zero photons. The detectors register a or 1 depending on 
whether a photon is incident on them or not. The basis states for the upper arm are therefore: 
|0)elO)/' |0)e|l)/' |l)elO)/- Thcrc is a similar sct for the lowcr arm (modcs /i). 

The firing patterns for an arbitrary setting of the wave plates, assuming perfect detection 
efficiency and no dark counts are given in the table: 



A 


B 


c 


D 





1 





1 





1 


1 





1 








1 


1 





1 


















The probabilities for these patterns are given by 



(45) 



where {i,j, k,£} E {0, 1}, and M^^ is the projector for detector A to register count i and 
simultaneously detector B to register count j. Similarly, is the projector for detector C to 
register count j and simultaneously detector D to register count £. The projectors for A and B 
in the above basis are: 



1 







i)i{h,q) 



1 

71 



sin 2h + i sm2{h — q) 
cos 2h — i cos 2{h — q) 



(46) 



01 
AB 





'^2{h,q) 



cos 2h + i cos 2{h — q) 
— sin 2h + i sin 2(/i — q) 



A similar set of projectors can be written for C and D with the variables h, q replaced by their 
primed counterparts h\ q' . 

The protocol is to measure the probabilities for enough settings of the variables that the 
elements of the two-photon density operator can be estimated. The two-photon density operator 
is the direct product of the one-photon density operator, for which the set of 3 states given above 
forms a basis. The basis states of the two-photon (9 x 9) density operator, p, are: \ijkt) = 
\i)e\j)f\k)g\^)h with k, e e {0, 1} Hence, 

zero photon |0000) 

one photon |0100), |1000), |0001), |0010) 
two photon |0101), |0110), |1001), jlOlO) 
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simulation results: one-arm 



Consider only one arm of the apparatus in figure 2, say the upper arm with detectors (A,B)- 
Suppose the wave plate settings are, 



{h^, |7 = l,...,ncfg } 



(47) 



Assume also that the incoming state always is one photon, never none. Hence, p G C^^^ and 
the projectors are: 

(48) 

with 'ipi,ip2 from (46). Assuming each detector has efficiency ?7, < ?7 < 1 and a non-zero 
dark count probability, 6, < 5 < 1, then there are four possible outcomes at detectors A,B 
given in the following table: 



a 


A 


B 


10 


1 





01 





1 


00 








11 


1 


1 



Following [15, 39] the probability of a dark count is denoted by the conditional probability, 

i^i\o = S (49) 

where 1|0 means the detector has fired "1" given that no photon is present at the detector "0." 
As shown in [15], it therefore follows that the probability that the detector does not fire "0" 
although a photon is present at the detector "1" is given by. 



z/o|i = (l-r/)(l-5) 



(50) 



Here 1 — 77 is the probability of no detection and 1 — 5 is the probability of no dark count. The 
remaining conditional probabilities are, by definition, constrained to obey: 



(51) 



The probabilities for the firing patterns in the above table are thus given by (7) with the following 
observables M, 



Ml 



10,7 



M, 



01,7 



(52) 



Mdo,7 = z/o|ii/o|oM^° + jyo\oT^o\iM', 



rOl 



M- 



11,7 



z/i|iZ/i|oAf^^° + z/i|oZ^i|iM; 



01 
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Numerical computer simulations were performed for two input state cases: 



pure state: pp^^e = 7; 



1 1 
1 1 



1 

^72 



(53) 



mixed state: 



Pmixd 



0.6 
0.2z 



-0.2i 
0.4 



For each input state case we computed A°p* with and without "noise:" 



no noise 



yes noise 



detector efficiency 77 = 1 

dark count probability 5 = 

detector efficiency 77 = 0.75 

dark count probability 6 = 0.05 



(54) 



For all cases and noise conditions we used the wave plate settings: 



hi 



(z-l)(5°), z 
(z-l)(5°), z: 



,10 
,10 



(55) 



100 



Both angles are set from to 45° in 5° increments. This yields a total of ricfg = 10^ 
configurations corresponding to all the wave plate combinations. 

Figure 3 shows the optimal distributions A°p* versus configurations 7 = 1, . . . , 100 for all 
four test cases: two input states with and without noise. Observe that the optimal distributions 
are not uniform but are concentrated near the same particular wave plate settings. These settings 
are very close to those established in [17]. 

To check the gap between the relaxed optimum A°p' and the unknown integer optimum we 
appeal to (36)-(37). The following table shows that these distributions are good approximation 
to the unknown optimal integer solution for even not so large 4xpt for the two state cases with 
no noise. Similar results were obtained for the noisy case. 



^expt 


^(^cxpt Ap^J,p, Ppure) 


^(^cxpt Ajjjjxd, Pmixd) 


^(•^expt (Ppurc), Ppure) 


^(^expt (Pmixd), Pmixd) 


100 
1000 
10,000 


.9797 
.9950 
.9989 


.7761 

.9735 
.9954 



The following table compares the distributions for optimal, suboptimal with 8 angles, uniform 
at the 8 suboptimal angles, and uniform at all 100 angles by computing the minimum number 



16 



of experiments required to obtain an RMS estimation error of no more than 0.01. 





optimal 


1 4-' 1 

suboptimal 


uniform 


uniform 


input state 


round {4xptA°P'} 


round |4xptA'"^| 


(1/8) 1(A''"'^) 


(1/100) 1(A°P'^) 




ricfg = 100 


ricfg = 8 


'^cfg = 8 


ricfg = 100 


Ppure, no noise 


20,308 


20,308 


20,638 


29, 274 


Ppurc, yes noise 


37, 775 


64, 129 


40, 178 


52,825 


Pmixd, no noise 


41,890 


92,471 


69, 750 


64, 780 


Pmixd, yes noise 


61,049 


134,918 


101,425 


94, 385 



For these optical experiments there is significant cost (in time) associated with changing wave 
plate angles and very little cost (in time) for an experiment. As a result, although the uniform 
distribution at all 100 angles does not require a significant increase in the number of experi- 
ments, and in some cases fewer experiments than the suboptimal case, it is very costly in terms 
of changing wave plate angles. The following table shows the wave plate settings and subopti- 
mal distributions from the above table. 







A'^"'^(Ppurc) 


A^"^(Ppurc) 


A'^"'^(Pmixd) 


A™'^(Pmixd) 


h 


Q 


no noise 


yes noise 


no noise 


yes noise 








0.24 


0.23 


0.14 


0.14 


5 








0.11 


0.11 


0.11 


10 








0.04 








15 


45 








0.07 


0.07 


20 


40 


0.01 











20 


45 


0.25 


0.12 


0.19 


0.19 


25 


45 


0.25 


0.13 


0.21 


0.21 


30 


45 








0.07 


0.07 


35 








0.04 








40 








0.09 


0.07 


0.07 


45 





0.24 


0.23 


0.14 


0.14 


45 


5 















This table shows that only a few wave plate angle changes are necessary if the suboptimal 
distributions are invoked. And from the previous table, as already observed, the suboptimal 
settings do not require a significant increase in the number of experiments required to achieve 
the desired estimation accuracy. 
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A°P*^, no noise 



0.3 
.25 
0.2 
.15 
0.1 
.05 















































































1 


■ 











20 40 60 80 100 120 



^pSre' yes noise 



0.15 



0.1 



0.05 
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Figure 3: optimal distributions - one-arm 



18 



simulation results: both arms 



We explore the two arm case under the assumption that two photons are always present at the 
input, thereby excluding the zero photon case.^ The table of detector firing patterns is: 



a 


A 


B 


c 


D 


0101 





1 





1 


0110 





1 


1 





1001 


1 








1 


1010 


1 





1 






The following three 4x4 input states are considered: 

Ppure ® Ppure 

Ppure ® Pmixd (56) 
Pmixd ® Pmixd 

with Ppure ) Pmixd givcn by (53). We use the angles from the one-arm optimal distribution for 
the largest 8 values: 

h e [0, 20, 25, 45] q G [0, 20, 25, 45] 

h' e [0, 20, 25, 45] q' G [0, 20, 25, 45] ^ ' 

This yields a total of ncfg = = 256 wave plate settings. 

Figure 4 shows the optimal distributions A°p* versus configurations 7 = 1, . . . , 256 for all 
three input states. In this case the optimal distributions A°p* are nearly uniform in magnitude. 
The following table compares the distributions for optimal, suboptimal with 25 angles, uniform 
at the 25 suboptimal angles, and uniform at all 256 angles by examining the minimum number 
of experiments required to obtain an RMS estimation error of no more than 0.01. 





optimal 


suboptimal 


uniform 


uniform 


input state 


round {4xptA°P*} 


round {4xptA™'^} 


(1/25) 1(A^"^) 


(1/256) 1(A°P*) 




ricfg = 256 


Ti g — 2 5 


ricig = 25 


ricfg = 256 


Ppure ® Ppure 


81,870 


81,877 


86,942 


115,165 


Ppure ® Pmixd 


135,158 


139,151 


156,219 


226, 234 


Pmixd ® Pmixd 


225, 739 


288, 163 


262,833 


427, 292 



As might be expected it is easier to estimate all pure states than mixed states. The table also 
shows that the optimal solution can be used effectively to guide the selection of suboptimal 
distributions. 



^In an actual laboratory setting it is important to include the zero photon case; it is often that no photon is 
actually present at the input. 
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A°P* for ppurc ® Ppurc 



50 



100 



150 



200 



250 



300 



A°P* for ppure ® Pmixd 



50 



100 



150 



200 



250 



300 



A°P* for pmixd ® Pmixd 



50 



100 



150 

7 



200 



250 



300 



Figure 4: optimal distributions - two-arms 
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2.5 Maximum likelihood state distribution estimation 



A variation on the state estimation problem is to estimate the distribution of a known set of 
input states. The set-up for data collection is shown schematically in Figure 5 for configuration 
7- 



ptruc g 



true \ 



7 




POVM 



a 



1 n. 



1 "-out 



Outcome counts 
L/ trials 



a 



1 , . . . , flout 



Figure 5: System/POVM. 
In this case the input state p^"^^^ is drawn from, 



(58) 



which consists of a set of known states, {p(l), . . . , p(nin)}, with corresponding unknown oc- 



currence probabilities / 



where < f 



< l,Vz and 



true / 



1. The objective is to use the data and knowledge of the input state set to es- 



timate the vector of occurrence probabilities Proceeding as before, assume that the input 
state model is p G where ^{f) = { p{i), f{i) | i = 1, . . . , riin } and where the vector of 
distributions / = [/(I), . . . , f{ni^Y ^ R-"'" to be estimated. In this case the input state can 
be represented by the mixed state. 



pU) = E fiM^ 



(59) 



i=l 



The ML estimate of / is then the solution of the optimization problem: 

minimize L{f) = — Y^a^ logTr Oc,T,p(/) 
subject to /(^) = 1, /(^) > 0, 2 = 1, . . . , 



(60) 



As in the MLE for quantum state estimation (17), the objective is log-convex in the state p(/), 
the state is linear in / G R"'", and the constraints form a convex set in /. Hence, this is a convex 
optimization problem in the variable /. Combining (59) with (60) gives the more explicit form. 



minimize L(/) = - Eq,^ log d^^f 
subject to ESi /(O = 1, /(O > 0, z = 1, 



[Tr 0,^p(l) • ■ - Tr Oo,^p{n,^)f G R"'°, V«,7 



(61) 



Here again as in (25) we could solve for / using the empirical estimate of the outcome proba- 
bilities as in (26): 



minimize Ea,^ (p"^^ - a^^/j 
subject to E£ri/(^) = 1, /(0>0, ^ 



1, . . . , TIjq 



(62) 
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2.6 Experiment design for state distribution estimation 

Let f^^" E R"'" be a surrogate for the true state distribution, /tr"^. Following the derivation of 
(28) in Appendix §A.3, the associated (relaxed) optimal experiment design problem is, 

minimize V^(A, Z™'''') = Tr (e^ A^G^(/™"))~^ 
subject to E7 = 1 (63) 
A7 > 0, 7=1,... ,ncfg 

where 

G,(/-) = (y. ^ R--^^--^ (64) 

with aa-y G R"'" from (60) and where Ccq G R"in^"i"~i is part of the unitary matrix W in the 
singular value decomposition: 1^.^ = USW^, W = [c Ceq] G R"in^"in. 



3 Quantum Process Tomography: OSR Estimation 

We explore two methods of identification for determining the Q-system from data: (i) in this 
section, estimating the OSR in a fixed basis, and (ii) in Section §4, estimating Hamiltonian 
parameters. In either case, the set-up is now as shown in Figure 6. 




POVM 

CM 1 , . . . , flout 



Outcome counts 
Ua-y, i-y trials 

a 1, . . . , Tloxxt 



Figure 6: System/POVM. 

The main difference between state tomography (Figure 1) and process tomography (Figure 
6) is that in the latter case the input state is prepared at specific values, p^, depending on the 
configuration, whereas the Q-system does not depend on the configuration. If the process varies 
with every change in configuration it would be much more difficult to estimate; a model of 
configuration dependence would need to be established. This situation is perhaps amenable 
with Hamiltonian parameter estimation but will not be pursued here. 



3.1 Maximum likelihood OSR estimation 

As already stated, the Krause matrices for modeling the (trace preserving) Q-system in this case 
are not dependent on the configuration 7, specifically, K = { Kk \ k = 1, . . . , k } with k < n'^. 
Using (10), the reduced state model in Figure 6 as a function of is, 

ay{K) = Q{p,, K) = X: Kup.Kl ^ KIK>, = 4 (65) 

fc=l k=l 
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Combining the above with the measurement model (9) gives the probability outcomes model, 

p^^{K) = Tr 0^^{K)p^, 0^^{K) = J2 KM^^Kk (66) 

fc=i 

The log-likelihood function (16) is, 

L{D, K) = -J2 log Tr 0^^{K)p^ (67) 

a, 7 

An ML estimate of K is then a solution to. 



minimize L{D,K) = -Y.aa ^Q^logTr ELi K^Ma^yKkP-/ 
subject to Efe=i K*kKk = In 



(68) 



This is not a convex optimization for two reasons: the equality constraint is not linear in K 
and the objective function is not convex. The problem can be transformed - more accurately, 
embedded - into a convex optimization problem by expanding the Kraus matrices in a fixed 
basis. The procedure, described in [27, §8.4.2], is as follows: since any matrix in C"^" can be 
represented by complex numbers, let 

{5i G C"^" |i = } (69) 

be a basis for matrices in C"^". The Kraus matrices can thus be expressed as, 

Kk = Yl o.kiBi, k= (70) 

i=l 

where the coefficients {aki} are complex scalars. Introduce the matrix X G C"^^"^, often 
referred to as the superoperator, with elements, 

K 

Xij = J2 o-ki^^kj, j = 1, . . . (71) 

fe=i 

As shown in [27], from the requirement to preserve probability, X is restricted to the convex 
set, 

X>0, Yl B:Bj = 4 (72) 
The system output state (65) and outcome probabilities (66) now become, 

a^{X) = Q{p^,X)=Y^ X,,B,p^B* 

»-i=i (73) 



Pa^{X) = Tr Q{p^, X) = Tr XR, 



■ay 



2 2 

where the matrix i?^^ ^ Qn xn j^^^ elements. 
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[Ra^]ij = Tr Bjp^B-Oa-y, ij = l,...,n (74) 
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2 2 

Quantum process tomography is then estimating X G C" ^'^ from the data set D (4). An ML 
estimate is obtained bv solving for X from: 



estimate is obtained by solving for X from: 

= _ V Incr Tr R 

(75) 



minimize L{D, X) = — X^^,^ log Tr XR^^ 



subject to X > 0, Y.ij B*Bj = J„ 

This problem has essentially the same form as (17), and hence is also a convex optimization 
problem with the optimization variables being the elements of the matrix X. Since X = X* E 
Qn xn ^ parametrized by real variables. Accounting for the n'^ real linear equality 
constraints, the number of free (real) variables in X is thus — th?. This can be quite large even 
for a relatively small number of qubits, e.g., for q = [1, 2, 3, 4] qubits, n = 2'^ = [2, 4, 8, 16] 
and n^ — n^ = [12, 240, 4032, 65280]. This exponential (in qubit) growth is the main drawback 
to using this approach. 

The X (superoperator) matrix can be transformed back to Kraus operators via the singular 
value decomposition [27, §8.4.2]. Specifically, let X = VSV* with unitary V G C"''''"' and 
S = diag(si ■ ■ ■ s„2) with the singular values ordered so that si > S2 > ■ ■ ■ > > 0. Then 
the coefficients in the basis representation of the Kraus matrices (70) are, 

aki = y/s^V*i„ k,i = 1, . . . (76) 

Theoretically there can be fewer then Kraus operators. For example, if the Q system is 
unitary, then, 

Q{p) = UpU* ill) 

In effect, there is one Kraus operator, U, which is unitary and of the same dimension as the 
input state p. The corresponding X matrix is a dyad, hence rank X = 1. A rank constraint 
is not convex. However, the X matrix is symmetric and positive semidefinite, so the heuristic 
from [10] applies where the rank constraint is replaced by the trace constraint, 

Tr X < r/ (78) 

From the singular value decomposition of X, Tr X = -^k, and hence, adding the constraint 
(78) to (75) will force some (or many) of the to be small which can be eliminated (post- 
optimization) thereby reducing the rank. The auxiliary parameter 77 can be used to find a tradeoff 
between simpler realizations and performance. The estimation problem is then: 

minimize L{D, X) = — J2a,'y f^a-i logTr XR^^ 
subject to X > 0, Y.^j B*Bj = /„ (79) 
TiX <r] 

3.2 Experiment design for OSR estimation 

Let X''™'' e C"^^"^ be a surrogate for the true OSR, X*™°. As derived in Appendix §A.4, the 
associated (relaxed) optimal experiment design problem is, 

minimize 1/(A, X^"") = Tr (e^ A^G'^(X^""))~^ 
subject to E7 = 1 (80) 
> 0, 7=1,.. . ,ncfg 
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where 



aa-f = vec Ra-f E C"" 



(81) 



and Ccq G C" ^" "is part of the unitary matrix W = [C Ccq] g C" ^" in the singular value 
decomposition of the x n'^ matrix, 

[ai ■■■ ttni] = U [Vnln2 0„2x„4_„2] W* (82) 

with with Gk = wec{B*Bj) G C"^ for k = i + (j — l)ri^, i, j = 1, • • • , The columns of 
Ceq, /.e., the last — columns of W , are a basis for the nuUspace of [ai ■ ■ ■ 0^4]. 



3.3 Example: experiment design for OSR estimation 

Consider the POVM set from the one-arm photon detector (§2.4) using all combinations of the 
following set wave-plate angles, 

h= [0 30 45], q= [0 30 45] 

Assume detector efficiency 77 = 0.75 and dark count probability 5 = 0.05. The set of inputs 
(state configurations) is 

|0), |1), k) = (|0) + \1))/V2, I-) = m+z\l))/V2 

The 9 combinations of angles together with the 4 combinations of input states gives a total of 
36 configurations, 7 = 1,..., ridg = 36. 

Figure 7 shows the optimal distribution of experiments for the 36 configurations using the 
true OSR corresponding to the Pauli basis set ^12/ V^, a^/V^,, ay/y/2, (Tz/v^j. Since the 
system is simply the identity, Q{p) = p, with this basis choice, X^^^'^ = diag(2 0). (No 
knowledge of the system being identity is used, hence, all elements of X^^^'^ are estimated, not 
just the single element in the "11" location.) 

The following table displays the minimum number of experiments required to meet estima- 
tion accuracies of 0.05 and 0.01 for both uniform and optimal distributions. 



accuracy 




^umt 


0.01 
0.05 


856,676 
34, 268 


1,304,561 
52, 183 



Approximately 35% fewer experiments are needed using the optimal distribution. Although not 
dramatic, as in the photon estimation example §2.4, there is a large penalty, in terms of time, 
for changing wave-plate angles. 
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Figure 7: optimal distributions for OSR estimation 



The following Table shows the 22 out of 36 configurations for A°p' > 0.01. 



7 




hry 


?7 


P7 


1 


0.011 








|0) 


2 


0.011 








|1) 


3 


0.073 








k) 


4 


0.068 








I-) 


5 


0.033 





30 


|0) 


6 


0.033 





30 


|i) 


7 


0.071 





30 


1+) 


8 


0.012 





30 


I-) 


12 


0.018 





45 


I-) 


21 


0.068 


30 


45 




22 


0.068 


30 


45 


|i) 


23 


0.063 


30 


45 


k) 


24 


0.141 


30 


45 


I-) 


25 


0.011 


45 





|0) 


26 


0.011 


45 





|i) 


27 


0.073 


45 





k) 


28 


0.068 


45 





I-) 


29 


0.033 


45 


30 


|0) 


30 


0.033 


45 


30 


|i) 


31 


0.071 


45 


30 


1+) 


32 


0.012 


45 


30 


I-) 


36 


0.018 


45 


45 


I-) 
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3.4 Maximum likelihood OSR distribution estimation 



Suppose the Kraus matrices are known up to a scale factor which is related to its probability of 
occurrence, that is, 

Kk = ^Jqk Kk Efc=i QkK^Kk = In .nox 
Et=iqk = l qk>0 k = l,...,K ^ ^ 

One interpretation of this system model is that one of the matrices, say Ki, is the nominal (un- 
perturbed) system, and the others, Kk, k = 2, . . . , n, are perturbations, each of them occurring 
with probability qk. Examples of perturbations include the typical errors which can be handled 
by quantum error correction codes, e.g., depolarization, phase damping, phase and bit flip; see, 
e.g.,[27,Ch.S]. 

The goal is to use the data to estimate the unknown vector of probabilities, q = [qi ■ ■ ■ qnV E 
R'^. Using the system model (83), the model probability outcomes are. 



Pay = Tr May Efc=l Qk KkP-y^l = O-Lq 



Tr MayKip^K^ • • • Tr MayK f^p^K ^ 



e R'^ 



(84) 



The ML estimate of g G R'' is the solution of the optimization problem, 

minimize L{q) = - J2a,y ^ay log a^^q 
subject to Efc=i5fe = l' qk>0,k=l,...,K 



(85) 



This is a convex optimization problem and is essentially in the same form as problem (60) which 
seeks the ML estimate of the input state distribution. 



3.5 Experiment design for OSR distribution estimation 

The formulation here is directly analogous to that of experiment design for state distribution 
estimation §2.6. Let q^^^^^ E R"'" be a surrogate for the true OSR distribution, gt^^^. Following 
the lines of the derivation in Appendix §A.4, the associated (relaxed) optimal experiment design 
problem is, 

minimize V{\, g™'''') = Tr A^G^(g™"))~^ 
subject to = 1 (86) 

A-y > 0, 7 = 1,.. . ,nrfg 

where 

Gyiq-n = e R-^-— (87) 

with aay E R"'" from (85) and where Ccq G R"in><"in~i is part of the unitary matrix W in the 
singular value decomposition: 1^.^^ = USW'^, W = [c Ceq] G R"in><"i". 
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3.6 Example: experiment design for OSR distribution estimation 



Consider a quantum process, or channel, where a single qubit state, p G C^^^, is corrupted 
by a bit-flip error with occurrence probability qs and a depolarizing error with occurrence 
probability qo- The process is described by the quantum operation,^ 



Q{Pi q) = Qi P + Qb XpX + qD 1/2 

qi + qB + qD = '^ 



(88) 



where qj = 1 — (g^ + Qd) is the probability of no error occurring. The probability of observing 
outcome a with the system in configuration 7 is,*^ 



Tr Ma-yQ{p-y,q) 



(1 



Qd 



(89) 



An interesting aspect of this problem is that not all input states p^ lead to identifiability of the 
occurrence probabilities. And this is independent of the choice of POVM Mq^. To see this 
consider the single pure input state. 



p^ = ifjifj*, ifj 

The output of the channel (88) is then, 

qi\a 



a 
b 



\' + qB\b\' + qD/2 
qia*h + qsab* 



qiob* + qBa*h 



(90) 



QiW + (lB\a? 



qD/2 



(91) 



Suppose we knew the elements of Q{^^*, q) perfectly; call them Qn, Q12, Q22- Then in prin- 
cipal we could solve for the three occurrence probabilities from the linear system of equations. 



lap 1/2 



1/2 



ah* a*b 



Qi ' 




' Qn 


Qb 




Q22 


Qd _ 




Q12 



(92) 



R 



If det R = then no unique solution exists; the occurrence probabilities are not identiflable. 
Specifically, det i? = for all a, 6 G C such that. 



(Re ah' 



a 



0, \a\ 



1 

1 



i&r = i 
' - 



(93) 



Y = 







Z = 



1 

-1 



'' X is one of the three 2x2 Pauli spin matrices: X = 

**As shown in [27, §8.3], an equivalent set of OSR elements which describe (88) are, 
y/qi /, y/gs X, ^JqE ^JqE X/2, ^JqE ^72, ^/qE Z/2. Forming the probability outcomes in terms 
of this expansion results in an overparamtrization. 
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Equivalently, det i? = for the following sets of a, 6 G C: 
(a = 0, |6| = 1), (|a| = 1, b = 0), (|a| 
Let the input state be a single pure state of the form 



\b\ = 1/V2 



COS 6 
sin 6 



(94) 



(95) 



Suppose the angle 9 is restricted to the range < 9 < 90°. Using (94), the occurrence proba 
bilities are not identifiable for the angles 9. and respectively, the states tp{9), in the sets, 

1/V2 



9 G {0, 45°, 90°}, ^{9) G 



1 




1/V2 




1 



(96) 



Unfortunately, this excludes inputs identical to the computational basis states |0) or respec- 
tively, -^{9) with ^ = or ^ = 90°. 

We now solve the (relaxed) experiment design problem (86) for occurrence probabilities 
(qi, qb, Qd) = (0.6, 0.2, 0.2) and with the POVM set given by (48). For illustrative purposes, 
we use only 16 of the 100 configurations represented by the wave plate angles (55). Specifi- 
cally, the wave-plate angles are: {h = 0, 15, 30, 45} x {g = 0, 15, 30, 45}. The optimization 
results are presented in the following table which shows the number of experiments per config- 
uration (each of the 16 angle pairs) required to achieve an accuracy of 0.01 for the input states 
corresponding to the angles 6* G {2, 10, 25, 35, 44}. 



configurations 


experiments per configuration 


h 




9 = 2 


^ = 10 


^ = 25 


^ = 35 


9 = U 








62,244 


15,453 


13,386 


31,275 


2,136,560 





15 


1 


1 


1 


1 


1 





30 


1 











1 





45 


1 











1 


15 





1 











1 


15 


15 


1 


1 


1 


1 


1 


15 


30 


73,096 


11,277 


4,765 


9,006 


107,371 


15 


45 


2,080,984 


89,588 


18,598 


14,573 


62,187 


30 





1 











1 


30 


15 


1 











1 


30 


30 


1 











1 


30 


45 


2,080,984 


89,588 


18,598 


14,573 


62,187 


45 





62,244 


15,453 


13,386 


31,275 


2,136,560 


45 


15 


1 


1 


1 


1 


1 


45 


30 


1 











1 


45 


45 


1 











1 


P — 

^-expt 


4,359,563 


221,362 


68,736 


100,705 


4,504,876 



The numerical example shows that for input states close to those states which make the 
problem not identifiable (96), the number of experiments required to achieve the specified ac- 
curacy grows very large. In this case, 9 = 2 and 9 = AA are close to the bad angles and 45, 
and the number of experiments is quite large. 
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4 Hamiltonian Parameter Estimation 



The process of modeling a quantum system in this case begins with the construction of a Hamil- 
tonian operator on an infinite dimensional Hilbert space. Eventually, a finite dimensional ap- 
proximation is invoked in order to calculate anything. (In some cases a finite dimensional model 
is immediately appropriate, e.g., spin systems, [11, Ch.12-9].) The finite dimensional model is 
the starting point here. 



4.1 Maximum likelihood Hamiltonian parameter estimation 

The quantum system is modeled by a finite dimensional Hamiltonian matrix H{t, 6) G C"^", 
having a known dependence on time t, < t < tj, and on an unknown parameter vector 
9 E H"'". The model density matrix will depend on 9 and the initial (prepared and known) 
state drawn from the set of states { pjj"'* e C"^" I /3 = 1, . . . , }. Thus, the density matrix 
associated with initial state p™' is p^{t, 9) E C"^" which evolves according to, 

zhpp = [Hit, 9),pp], p^(0, 9) = f}f (97) 

Equivalently, 

Pp{t,9) = U{t,9)f}fU{t,9r (98) 
where U{t, 9) E C"^" is the unitary propagator associated with H{t, 9) which satisfies, 

ihU = Hit, 9)U, f/(0, 9) = In (99) 

At each of risa sample times in a time interval of duration tf, measurements are recorded from 
identical repeated experiments. Specifically, let { | t = 1, • • • , n^a. } denote the sample times 
relative to the start of each experiment. Let Ua^T be the number of times the outcome a is 
recorded at t^. with initial state p™'* from f^^- experiments. The data set thus consists of all the 
outcome counts, 

D = { n^pr I a = 1, ... , riout, 13=1,..., n-,^, r = 1, . . . , rzsa } (100) 

The configurations previously enumerated and labeled by 7 = 1, ... , ncfg are in this case all the 
combinations of input states pjf'* and sample times r, thus ridg = ninnga- For the POVM Ma, 
the model outcome probability per configuration pair (pjf'*, tr) is, 

Pal3r{9) = Tl MaP(s{tr,9)=TTOar{9)p'^'' 

OaA9) = Uitr,9yMaUitr,9) ^'""^ 

The Maximum Likelihood estimate, 9^^ E R"", is obtained as the solution to the optimization 
problem: 

minimize L{D, 9) = - Ea,p,r ^a/Sr log Tr Oar{0)p'p'^ ^^2) 
subject to 6^ e 9 

where is a set of constraints on 9. For example, it may be known that 9 is restricted to 
a region near a nominal value, e.g., 6 = { 6^ | ||6' — 6'nom|| ^ S }. Although this latter set is 
convex, unfortunately, the likelihood function, L(D, 9), is not guaranteed to be convex in 9. It 
is possible, however, that it is convex in the restricted region 0, for example, if 5 is sufficiently 
small. 
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4.2 Experiment design for Hamiltonian parameter estimation 



Despite the fact that Hamiltonian parameter estimation is not convex, the (relaxed) experiment 
design problem is convex. A direct application of the Cramer-Rao bound to the likelihood 
function in (102) results in the following. 



Hamiltonian parameter estimation variance lower bound Suppose the system 
generating the data is in the model set used for estimation, i.e., (13) holds. For 
£ = [£i- ■ -Crfg] experiments per configuration {p^^^^,t^), suppose 9{£) G R"* is 
an unbiased estimate of 6**"^"° G R"". Under these conditions, the estimation error 
variance satisfies. 



where 



E ||e(£) -r^^r > y{^.o 



ntruc\ 



truc\ — 1 



Tr G{£, r^°) 



ntruo \ 



G R 



nexng 



(103) 



E 



'(Ve Pa/3rW)(Ve Pap AO)) 
PaPriO) 



G R 



ngxng 



g=gtiuc 



(104) 



The relaxed experiment design problem with respect to the surrogate 9 for 9^^^'^ is. 



minimize V{X, 9) = Tr (E;3,r ^prGfSr{9) 

subject to I]/3,T -^/3r = 1 



(105) 



with optimization variables A/jt, the distribution of experiments per configuration U). 
The difference between this and the previous formulation is that there are no equality constraints 
on the parameters. The gradient Pai3T{9) and Jacobian Wee Pa/3Ti9) are dependent on the 
parametric structure of the Hamiltonian H(t, 9). 

4.3 Example: experiment design for Hamiltonian parameter estimation 

Consider the system Hamiltonian, 

H = 9'''"'e {X + Z) /V2, (106) 

with constant control e. The goal is to select the control to make the Hadamard logic gate, 
t^ad = (X + Z)l\f2. If 6'*™'' were known, then the control e = 1/6'*''"'' would produce the 
Hadamard (to within a scalar phase) at time t = n/2, that is. 



U{t = 7T/2) = exp {-z{7r/2)H{e = l/^*"--)} = -t U^,^ 



(107) 
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We assume that only the estimate 9 of 9^^^^ is available. Using the estimate and knowledge of 
the Hamiltonian model structure, the control is e = 1/9. This yields the actual gate att = n/2. 



f/act = exp {-i{7r/2)H{e = 1/9)} = -i U,,^ exp {-i6 {tx/2) f/had} 
5 = ^fuc/^-i 



(108) 



Since the parameter estimate, 9, is a random variable, so is the normalized parameter error 5. 
Assuming the estimate is unbiased, the expected value of the worst-case gate fidelity (1) is given 
explicitly by. 



E min Kf/had^A)* (t/act^A)!' = E cos^ 

U\\=i 



TT 



TT 



(109) 



configurations 



Consider the case where the system is in the model set, the POVMs are projectors in the com- 
putational basis (|0), and the configurations consist of combinations of input states and 
sample times. Specifically, the example problem is as follows: 

model Hamiltonian H{e, 9) = 9 e {X + Z) /V2 
true Hamiltonian H'''''' = 9^'''^e {X + Z) / 
POVM Ml = |0)(0|, M2 = 

sample times (110) 
tk = 5{k-l), k = l,..., 100, 5 = (7r/2)/99 
with pure input state 

^-it = |0) or^i^'t = f/had|0) 

In this example, with a single parameter and a single input state, the optimal experiment design 
problem (105) becomes: 

minimize ^ (A, ^'^"'■'■) = (Er A^^^(^'^""))"^ 

subject to Er -^r = 1 (111) 

> 0, Vr 

The trace operation in (105) is eliminated because the matrix G(3ri9^'^") is now the scalar. 



9r 



E 



PaT{9) 



e R 



(112) 



The solution can be determined directly: concentrate all the experiments at the recording time 
tr where gT{9^^") is a maximum, specifically. 



t°P* = {t. \9s 



9t 



Vs,r} 



(113) 
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The following tables show the minimum number of experiments at the optimal recording time, 
t°P\ in order to achieve 0.01 accuracy (deviation) in the parameter estimate with 6^^" = 9^'^^^. 
The two cases shown are for the two input states with the control set to unity. 





e = 1 






e = 1 






init ^ 




^init 


= f/had|0) 




^surr ^true 


t°PV(7r/2) 


^oxpt 




^surr ^true 


t°PV(7r/2) 


^expt 


0.9 


0.68 


8,876 




0.9 


1.0 


2052 


1.0 


0.61 


10,957 




1.0 


1.0 


2069 


1.1 


0.56 


13,262 




1.1 


1.0 


2052 



(114) 



To make the Hadamard the control update is £ = 1/6 which follows from the (risky) assumption 
that the estimate, 9, is perfectly correct. For any of the true values from the above table (1 14), all 
the estimates have the same accuracy. Hence, the average value of the worst-case gate fidelity 
after the update is approximately, 

2 



E ,min |(f/des^)*(f/a, 

11-011=1 



(o.or 



1 - 0.00024672 = 0.999753 (115) 



The ease of obtaining the estimate by minimizing the negative log-likelihood function can be 
determined by examining the average likelihood function, 

E L{9) =Y.i, Pa^ie'^n logPa,(^) (116) 



a, 7 



which is obtained from (102) and (6). Figure 8 shows plots of normalized'^ E L(9), 0.8 < 
9 < 1.2 for the three true parameter values and corresponding optimal recording times from the 
table in (1 14) with control e = 1 and initial state |0) . In all three cases, over the 9 range shown, 
E L{9) is convex. 

Figure 9 shows plots of normalized E L(9), 0.8 < 6* < 1.2 for the three true parameter 
values and corresponding optimal recording times from the table in (1 14) with control e = 1 and 
initial state f/had|0). For initial state |0), a range of 8876 to 13262 experiments are needed at the 
optimal recording time to achieve 0.01 deviation in the estimate. With the initial states f/had|0) 
the same accuracy only requires about 2000 experiments. This difference can be inferred partly 
by comparing the curvature in the plots in Figure 8 with 9; as they are plotted on the same 
normalized scale. Note the increased curvature of E L{9) in Figure 9 in the neighborhood of 
the true value. 

If we further increase the control effort, say to e = 5, the number of experiments required 
to achieve 0.01 accuracy is significantly reduced as seen in the following table. 

6 = 5 
^i°it = |0) 





^surr ^true 


t°PV(7r/2) 


•^expt 




0.9 


0.93 


98 




1.0 


0.84 


121 




1.1 


1.00 


122 



(117) 



The plots show E L{9) divided by its minimum value, thus normalized to have a minimum of unity. 
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However, Figure 10 shows clearly that the average likelihood function is now significantly 
more oscillatory, and certainly not convex over the range shown. It is of course convex in 
a much smaller neighborhood of the true value. This would require, therefore, very precise 
prior knowledge about the true system. Thus we see a clear tradeoff between the number of 
experiments to achieve a desired estimation accuracy and the ease of obtaining the estimate as 
seen by the convexity, or lack thereof, with respect to minimizing the likelihood function, which 
is the optimization objective. 



^truc 


0.9, t°PV(7r/2) = 0.68 


















1 














^truc 


1. t°PV(7r/2) = 0.61 



1 .OS 
1 OS 
1 .0-4 




gtruc ^ t°P7(7r/2) = 0.56 




e 

Figure 8: Normalized average likelihood function E L(9) with control e = I and input state 
^init ^ |o) for the true parameter values and associated optimal recording times as indicated 
above and given in (1 14). 
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1.1, t°PV(7r/2) = 1 



e 

Figure 9: Normalized average likelihood function E L{6) with control e = \ and initial state 
^imt _ f/j^^j|o) for the true parameter values and associated optimal recording times as indi- 
cated above and given in (1 14). 



^true 


0.9, t°PV(7r/2) = 0.93 
































S 1 

^truc 


1, t°PV(7r/2) = 0.8 


,4 




gtruc ^ t°PV(7r/2) = 1 




Figure 10: Normalized average likelihood function E L{Q) with control e = 5 and initial state 
^imt _ |g^ ^Qj. j-j^g parameter values and associated optimal recording times as indicated 
above and given in (1 17). 
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5 Summarizing Maximum Likelihood Estimation & Optimal 
Experiment Design 

The results presented show that an efficient numerical method based on convex programming 
is possible for optimizing the experiment for state and process tomography. In addition, the es- 
timation of the state and/or process using data from non-continuing measurements is copacetic 
with Maximum Likelihood Estimation. Both the experiment design and estimation work natu- 
rally together and both can be solved using convex optimization methods. 

Maximum likelihood estimation 

The general form for estimating the parameter vr*''"'^ is obtained as the solution to the optimiza- 
tion problem, 



minimize L(7r) = — I]a,7 ^^07 log Pa'yi'^) 
subject to 71 eH 



(118) 



Under the assumption that the system generating the data is in the model set used for estimation, 
the estimate, vr^'", the solution to (118), is unbiased and has the asymptotic variance, 

E IItT^L _^truc||2 ^ _1_ y^^^^^truc) aS 4xpt ^ OO (119) 

where A is the vector of fraction of experiments per configuration, and V{\, 7r'''"°) is obtained 
from the Cramer-Rao inequality. 



Optimal experiment design 

The general form for estimating the configuration distribution A is obtained as the solution to 
the optimization problem. 



minimize V{X,7c) = Tr [Y.-y -^7 G^{Tf 



-1 



subject to 1^7 -^7 = I7 A^ > 



(120) 



where vf is a surrogate for ir^^^^. 

Tables 1 and 2 summarize the class of estimation and experiment design problems, respec- 
tively. 
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objective 




TT e n 


comment 


Hamiltonian 

parameter 

estimation 


Pq^(6') = Tr Oa^{9)p^ 


\\d — 6'nom < 5 


not convex 
in u 

many local 
minima 


state 

estimation 


Pa-yip) = Tr Oa-f P 


Tr p = 1, p > 


convex in p 


state 

distribution 
estimation 


PaM') = (^l-if 
ifla'y)i Tr Ofj-y Pi 


Li Ji = 1, Ji > u 


convex in / 


OSR 

nxeu oasis 


Pa^(A ) = Tr Ra') A 
[R^,\^ = Tr B;M„^5, p^ 


X > 

Y.i,j B* Bj = I 


convex in X 


OSR 

distribution 

(Ki basis) 


Pa^iq) = al^q _ _^ 
{aa^)i = Tr Ma^Kip^K* 


Ei?. = 1, f/z > 


convex in q 



Table 1 : Summary of maximum likelihood estimation. Except for Hamiltonian parameter esti- 
mation, all other cases are convex optimization problems. 



objective 




G^(7r™") 


Hamiltonian 

parameter 

estimation 


p„^(e--) =TrO„^(^-")p^ 








state 

estimation 


Pa^(p^"^0 = TrO„^p^-'- 




^Ea p^(vec 0„^)(vec 0^^)*^ 


Ccq 


state 

distribution 
estimation 


Pa,{m = alj^-" 

{fla'))i Tr Oa'yPi 


Qq 






OSR 

fixed basis 

(Bd 


p„^(X^"") = Tri?„^ X^"" 
[R^^]^^ = Tr B*M^^Bi p^ 




I^Ea pi:;:(vec /2„^)(vec i^a^,)*^ 


Ceq 


OSR 

distribution 
(Ki basis) 


Pa^iq'^'n = al^q^' _^ 
{aaj)i = Tr Ma^Kip^K* 






Ceq 



Table 2: Summary of optimal experiment designs. These are convex optimization problems 
in all cases. The matrix Ccq comes from the associated parameter equality constraints, e.g., 

Tr p-" = 1, Ei /r" = 1. E.J X^^^' B* B^ = I, etc.. 
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6 Iterative Adaptive Control 



Quantum process tomography can be used for adaptive control design as described in [20]. 
Adaptive control systems are in general one of two types [2, 1]: (1) indirect adaptive control - 
use the data to first determine parameters in a system model, then based on the model determine 
the control parameters, and (2) direct adaptive control - use the data to directly select control 
parameters by comparing actual performance to an ideal. Applications of direct adaptive control 
(and learning principles) to quantum systems can be found in [30, 41, 31]. It could be argued 
that the direct approach is simpler in that a system model is not needed. However, this is a 
little deceptive because in effect the closed-loop system is being modeled indirectly via the 
ideal performance. This is made more explicit in the unfalsification/invalidation approach to 
adaptive control, [34], [19]. 



6.1 Indirect adaptive control 

Hamiltonian parameter estimation and associated optimal experiment design can be combined 
in an iterative indirect adaptive control approach as depicted in Figure 11. 





wait 









Control 
Design 



Model 
Parameter 
Estimator 



Controller 



System 



n 



Figure 1 1 : Iterative adaptive control 
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Typical steps in the iteration are: 

control design e^*-* = argopt^ J{e, 6^^^) 



experiment design 



= round(4xpt A») 
A(*) = argminl V(A,e(*\^(*)) I A > 0, ^ A^ = 1 

I 7 



(121) 



collect data 



estimate parameters 



a-y 



a 



1 , . . . , /iout 5 T 1 5 • • • ; ^cfg J' 



arg mm 



< 



The "opt" in the control design step could be to maximize worst-case gate fidelity (1), 



max mm 

U\\=i 



(122) 



where U (t, e, 6) is the propagator arising from the parametric Hamiltonian model. The control 
design step is not necessarily convex. Even if it were, it is not yet known under what conditions 
the complete iterative procedure will converge to the optimal control, or converge at all [20] . 
For example, in the simulations to follow convergence to the optimum is dependent on the initial 
parametrization. The properties of this type of iteration remain an area for further study. 



Example 

The spin-coherent photon transmitter/receiver system proposed in [38, 40] creates quantum 
logic gates by manipulating electron spin via external potentials (gate voltages) to effect the 
g-factors in the semiconductor material in the presence of an external (rotating) magnetic field 
([18] explores g-tensor control without the rotating field). Following [11, III,Ch.l2-9] on mod- 
els of spin systems, an idealized model of the normalized Hamiltonian in the rotating frame of 
a two-qubit gate under "linear g-factor control" is given by. 



H 



H1 + H2 + H 



12 



Hi = l[ei,uJo{Z ^ I2) +ei^uJi{X ® I2)] 

H2 = I h.tUo(/2®^)+ 52x^^1 (/2®X)] 



The design goal is to use the 5 controls (^i^, ei^, e2z, ^2x, ^c) to make the Bell transform. 



bell 



V2 



10 1 
10 1 
10-1 
10-10 
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One of the many possible decompositions of the Bell transform is the following: 

Each operation in this sequence uses only the single qubit and swap "gates" produced by simul- 
taneously pulsing the 5 controls as shown in the following table. 







£2z 


£2x 




At 


gate 
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Figure 12: Pulse control table 
The resulting gate at the final time, tf , is t/bcii to within a scalar phase: 

f/(tf) = e-*? t/bell, = f — + 7^ + ^) 

Suppose the only unknown parameter is uji. Consider the following simplified version of (121): 

control design eW=e(DP), tf(*) = tf(cDf^) 
estimation tDf^"*^^ = argmin ELfcui, £:'•*■') 

where the control design function e{Qi^) represents the pulse design from the above table, and 
where the average likelihood function follows from the description in Section §4.1 with the 
following parameters: 

single initial state p^"* = |0)(0| (/3 = 1) 

POVM Mi = |0)(0|, Af2 = |l)(l| Kut = 2) 

sample times either {tf (a)i), (nsa = 1)} or {U{Qi)/2, U{Qi), {n^^ = 2)} 

Using Hamiltonian parameters {uJq^'^ = 1, tof^^^ = 0.01, c^;*'^"'' = 0.01), Figure 13 shows 
E L(Qi, risa = 1) vs. tDi/cu*^"'^ for sequences of adaptive iterations using the estimate Qi 
obtained from a local hill climbing algorithm, i.e., the local maximum of the average likelihood 
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Figure 13: Iterative adaptation for rtsa = 1 at tf (cDi) for two starting values of 



function is obtained. The estimation is followed by a control using the estimated value in the 
pulse control table. In the two cases shown the algorithm converges to the true value. 

Although not shown, the algorithm does not converge from all initial values of Figure 14 
shows ||t/(tf (cDi), e(cDi)) — ?/dcs||frob vs. estimate cDi/u;f"'^ with the control from the table. The 
function is clearly not convex. The region of convergence for n^^ = 1 and n^a, = 2 sample times 
are shown in blue and green, respectively. The region of attraction is increased for riga = 2. 
These results, of course, are specific to this example and can not be generalized. To re-iterate, 
conditions for convergence, region of attraction, and so on, are only partially understood, in 
general, for this type of iteration [16]. 



6.2 Direct adaptive control 

In a direct adaptive control system no model is posed for the system; ideally only a performance 
measure is available and the control parameters are adjusted to improve the performance. The 
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Figure 14: Regions of convergence. 



adjustment "directions", however, clearly must depend on the shape of the "landscape", other- 
wise, it would not be possible to know how to make the adjustment. In effect then, a model of 
the landscape is either available or is computed intrinsically. Consider the following bipartite 
system whose Hamiltonian depends on the two controls {Ez-, e^)- 

H{e) = HQ{e)®lE + lQ®HE + HQE 

Hq{€) = {e,-l)ujQ,Z/2 + e,ujQ,X/2 

He = ujezZ/2 + ue,X/2 

Hqe = ujQEiX^' + Y^' + Z^') 

The Q-part of the system is assumed to be accessible to the user and the E-part, the "environ- 
ment", is not. The goal is to select the controls to make the Q-system behave as a bit-flip device, 
i.e., the Pauli X matrix. Suppose the Q-system is prepared in the initial state pg^* = |1) (1| and 
a measurement is made at tf = tt/cjq^. of the state |0). hence, ideally, the outcome probability, 
pie), should be unity. Due to the uncontrolled .E-system coupling, however, the goal is to select 
the controls e to make p{e) as large as possible. Under these conditions the outcome probability 
p{e) arises from, 

p{e) = TTMU{tf)poU{uy 
iU{t) = H{e{t))U{t), U{0) = I,0<t<t{ 
Po = f^^'®f^i' 

Of course the system Hamiltonian is not known, only at best the outcome probability would be 
known after enough repetitions of the experiment. 
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Figure 15 shows the landscape of the system with parameters, cjq^ = 1, ^Qx = 0.01, uez = 
1, = 0, cuqe = 0.005, p'g'* = I2/2 and for constant values of the controls over the ranges 
0.96 <e,< 1.04, 0.1 < e,^ < 5.2. 




Figure 15: Two parameter landscape; the maximum probability of 0.96 is achieved with 6^ = 
and Ex = 5.2. 

The landscape clearly has several local maximum values. Thus without some knowledge 
of the landscape or an exhaustive search, it might be difficult to find the global maximum. In 
this two-parameter case, of course, an exhaustive search is not too exhausting. Nevertheless, it 
is clear that any direct adaptive algorithm will face some difficulties. It is also clear that prior 
knowledge can alleviate many of the difficulties, e.g., knowledge of system parameter ranges or 
nominal response which is close to a good outcome, etc. . 

A more in-depth analysis of the landscape for control of quantum systems can be found in 
[32]. There it is shown that for unconstrained time-varying controls, if the system is control- 
lable, then all the local maximum are global. That is, the outcome probability at every local 
maximum is unity and all other extrema give the minimum probability of zero. The complex- 
ity shown here results principally from the fact that the choice of controls is constrained to be 
constant. This points out the importance of constraints either imposed by the physics or by the 
designer in inadvertently making the "wrong" choice of the control structure. In particular, sup- 
pose that the landscape is actually very simple with even possibly one extremum when viewed 
with no constraints on the controls. However, when constraints are imposed, or a new set of 
controls is defined, the landscape may then exhibit structure that was not evident in the freely 
floating original set. An extreme opposite case could be for a choice of variables where the new 
variables actually do hardly anything at all with regard to control action. In this case one would 
conclude that the landscape is totally flat! 

Many of the current "working" adaptive feedback control experiments are operating some- 



43 



where between these two extremes, ranging from having highly constrained knobs to the totally 
wrong knobs. This comment comes from the observation that physical effects spanning a dy- 
namic range in quantum wavelength of about 10^ are being controlled by a single type of laser, 
i.e., the Ti:Sapphire laser, working over a range of about 1-10 on that scale. That is, a domain 
in laser wavelength of 10 (in some units) is split up into about 100 small pieces as controls, and 
those very narrow controls manage everything over a huge dynamic range. The fact that the 
experiments work at all is rather amazing. A guess is that an examination of the landscapes will 
show considerable detail, much of which is likely false structures arising from having highly 
constrained controls. From a positive perspective, as more bandwidth becomes available the 
control landscape will become less complicated and more regular in the sense that more of the 
local optimum values will provide performance close to the global optimum. 

A Appendix 

A.l Worst-case gate fidelity 

From the definition of worst-case fidelity (1), 



|2 



= \{v*ip)*n{v*ip)f 



\J2k=l Zk 



2 



where the last three lines follow directly from (i) the eigenvalue decomposition of the unitary: 

t/dcsf^act = VnV\ V*V = In,^ = diag(^i ■ ■ ■ \ujk\ = 1, (ii) defining x = V*^ G C", 
and (iii) defining Zk = G R. Using the definitions of the vectors (a, b) in (2) gives. 



|(t/desV')*(^actV^)l' 



k=l 



2 

= z^iaa^ + bb^)z 



The QP follows from the relations: 

ll^ll = 1 ^ = v*ij\\ = 1 ^Y.(^k = \xk\^) = 1, Zk > 

k 

A.2 Cramer-Rao Inequality 

The following is the classical form of the Cramer-Rao Inequality. 
Cramer-Rao Inequality[8] 

Let 6^0 G be the true parameter to be estimated from a data set D. Let L{D, Oq) 
be the true negative log-likelihood function of the system generating the data. Let 
9 G R''' be an unbiased estimate of 9q, i.e., E 9 = 9q. Then, the covariance of the 
estimate, 

cov^ = e(^- ^o) {9- 9oY (123) 
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satisfies the matrix inequality, 



cov 9 I 

I F%) 



> 



(124) 



where F{6q) is the Fisher information matrix, 

F{eo)=EVeeL{D,e] 

If ^(6^0) > 0, then (124) is equivalent to, 

cov^ > F(^o)"^ 



e=eo 



(125) 



(126) 



This famous theorem states the for any unbiased estimator, the covariance of the estimate satis- 
fies the inequality (124), or equivalently (126), provided the Fisher matrix is invertible. Usually 
only (126) is given as the theorem: the minimum covariance of the estimate is given by the 
inverse of the Fisher information matrix. The power of the result (124) is that it is independent 
of how the estimate is obtained. The lower bound only depends on the model structure, the 
experiment design, and the information in the data. 



A.3 Derivation of (28) 



Define the vectors r, aa-y G C" as. 



r = vec p, aa-y = vec O 



'a-y 



Then 
and 



Pay = Tr OayP = a* T 



Tr p = 1 <^ b'^r = 1, b = vec /„ 



The next step eliminates the equality constraint = 1 reducing the ri^ unknowns in r to 



n 



1. Since b'^b = n, the SVD of b is. 



b = W 



2 



with unitary W E R"'^"', Partition W = [c Ceq] with Ccq G R"'x"'-i. Then all r satisfying 
b^r = 1 are given by 

r = c/ \fn + Ceq2; 

for all z G C""^\ The likelihood function with the equality constraint (Tr p = 1 or = 1) 
eliminated is then a function only of z, 

L{D, z) = Hay log(c/ ^/n + Ceq^) 

a,7 
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To obtain the Cramer-Rao bound, we first compute, 

^ zzL{D, z) ='Y^ - 73^(C'j^aQ,^)(Cj^aQ,^)* 

a, 7 Pa^\^) 

Using (6), E ria^ = i^Pa-yip^^'^^), P^^'^^ = cj ^fnAr Ceqz''''^'^, and hence the Fisher information 
matrix is, with respect to z. 



F = E z = z*''™) 

a, 7 Fa'yKH ) 



':7 



where the last line comes from the definition of G{i, p*'^"*^) in (29). Let 
unbiased estimate of r^^^^^. Thus, 



COV r = CeqCOV Z > CeqF-^Cjq 



Using var p = var f = Tr cov f and the fact that W is unitary, and hence, C^^Ccq = /n2-i, 
we get, 

var p > Tr CeqF-^Cjq = Tr F'^ 
which is the final result (28)-(29). 



A.4 Derivation of (80) 

Define x, Tq,^ e C"" as, 

X = vec X, ra7 = vec i?^^ 
The likelihood function in (75) can then be written as, 

L{D,x) = - ^ n„^logr*^x 

a,7 

and the equality constraint in (75) becomes. 

Ax = vec /„, A=[ai ■■■ a„4] e C"'^"' 

with Ofc = wec{B*Bj) G C"^ for/c = i + — l)n^, i, j = 1, . . . Perform the singular value 
decomposition A = [/[S 0]l^*, = [C CeJ with C e C"'^"', Ceq G C"'^""""' as given in 
(82). From the definition of the basis functions (69) it follows that 5* = y/n Irfi- Observe also 
that the columns of Coq are a basis for the nuUspace of A. Hence all x satisfying the equality 
constraint are given by, 

X = x + Ceq2;, Vz with X = {1/ \/n)CU*wec In 

The likelihood function with the equality constraint {Ax = vec /„) eliminated is then a function 
only of z, 

L = - "^Tlay log(x + CcqZ) 

a, 7 

This is exactly the same form of the likelihood function in §A.3 after the single equality con- 
straint there is eliminated. Hence, to obtain the Cramer-Rao bound (80) repeat, mutatis mutan- 
dis, the procedure in §A.3. 
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